This lecture is on giving computers common sense.
I have been interested in this problem since the 1950s,
and gave a paper on it in 1958, when my hair was as black as yours.
I still haven't quite solved it.
People keep trying,
but it may turn out to be a very difficult scientific problem
to understand what is involved in common sense
well enough to make computers carry it out.
Immediately we come to a controversy, because
there are two ways of looking at common sense.
One way is to say,
"Well, common sense is a human property,
and therefore understanding it is a branch of psychology."
The other way to look at it is to say that common sense refers to
certain kinds of information situations,
to the ability to achieve goals with certain information
and lacking other information,
in which case it's a branch of computer science rather than a branch of
psychology.
Naturally, the problem is big enough so that there is room for everybody
to try to do the problem in his own way,
but let me say that I take the second view of it,
and regard the common sense problem in particular,
and artificial intelligence in general,
as a problem of computer science --- as
devising means whereby with certain kinds of information
one can solve problems.
Already in 1958 it was clear that
it was not going to be so easy to define common sense
or to define intelligence.
But in any case I made the statement in the paper "Programs with Common Sense"
that a program has common sense if it can deduce the sufficiently
obvious consequences of what it is told and of its general knowledge,
the consequences arising from the facts of a particular problem
combined with that general knowledge.
This gives rise to some questions:
namely what general knowledge does a person have
that is available for use in solving problems
and what are obvious consequences.
Let me discuss a little bit about what is obvious and what is not.
I'll tell you a little story.
A few weeks ago,
I wanted to go to a meeting and I was late to this meeting.
I drove my car to the parking lot,
and I was carrying a rather heavy attache case.
The question was how to get to this meeting on time.
Driving the car there was not very feasible, because
there probably was not going to be any place to park.
So there were several possibilities:
one was to walk there with the heavy case.
Another one was to go to my office,
which was on the way, and leave the case there,
and then go to the place of the meeting.
Another was to use my bicycle which
was in a bicycle locker near my office.
I said,
"Well, I don't want to carry the brief case on the bicycle,
I am not good at that.
So maybe I'll have to walk to my office
and then walk to the bicycle".
Then the solution came.
And I claim that even though the solution is easy,
and any of you might get it, it's still not obvious.
The solution was to walk to the bicycle locker,
put the briefcase into the bicycle locker and take the bicycle.
Now what I want to argue is that the solution is not obvious.
Moreover, if you make a computer program with common sense,
then you will have to make it so that that solution is not obvious.
The bicycle locker is a tool,
and the way in which a system that has common sense should treat a tool
is that a tool should be used for its usual purpose.
The usual purpose of the bicycle locker is to store the bicycle,
not to store something else.
So the point of this little story is that
when we want to design a system that has common sense,
we don't want it to explore all the possible actions
that it has available to it,
because while the action of using this particular thing in this particular way,
using the bicycle locker to store the briefcase, was in fact correct,
there were many other possibilities of actions that could be performed,
like standing on the briefcase,
or standing on the bicycle locker, or carrying the bicycle.
All those make no sense, but they would conceivably be explored.
That digression actually probably should have been later
in the lecture, but I wanted to make it while I thought about it.
Something that came along fairly early in thinking about common sense
was trying to use mathematical logic as the language in which the computer
program would represent what it knew about the world,
both its general knowledge
about the world and the knowledge of a particular situation.
Now this is not an inevitable decision.
In fact it's a controversial idea.
There has been controversy, the declarative versus procedural controversy,
as to whether knowledge should be represented by a computer program or whether
it should be represented by facts in the computer memory.
In the long run, the answer is to use both,
and even to represent the same information in both ways.
In fact today's AI programs are not able to make substantial
declarative use of information that is also procedurally represented.
But it is interesting that when one person is instructing another person
on how to do something, almost all of the information that is
transmitted is declarative, i.e. facts about the situation.
What you tell the other person does not correspond to telling him
a computer program.
For communication declarative sentences have very important
advantages, the largest of which is that it is not necessary
to understand another person's internal state, or internal workings
in order to tell him facts, whereas if you imagine that he had a
program and you wanted to debug his program, this would be like
trying to do education by brain surgery, which fortunately we
don't know how to do.
The idea of using logic to describe facts about the common sense
world is old; it goes back to Leibniz, who
wanted it to be able to replace argument in controversies
over human affairs by calculation.
Leibniz did not get very far with his project.
He did not even invent propositional calculus, and that requires
some remark, because propositional calculus is much simpler than
the infinitesimal calculus of which Leibniz was a co-inventor with Newton.
So he solved what was technically a harder problem and didn't solve
the easy problem. The reason presumably is that the concepts were
somewhat confusing, and what I will say is that the concepts are still
somewhat confusing, although we have made a lot of progress.
Probably fifty years from now or maybe a hundred years from now,
people will look back and say, "How could these people in the 1980s
have missed what is so obvious to us?"
Leibniz had the idea of 0 and 1 as truth
values,
but he missed the fact that the operations on these truth values should be
quite different from the arithmetic operations in the binary number system.
Boole, who did develop the propositional calculus, called his
book 'The Laws of Thought', and he wanted to apply it to reasoning in
general. Frege had some such ideas also, but after Frege,
it seems that mostly the mathematical logicians said,
"Well, we can do the foundations of mathematics", and had great
success in applying mathematical logic to the foundations of mathematics,
and this success in one area caused them to neglect trying to apply it
to common sense reasoning. All that may be just to say that the subject
turned out to be a difficult one, and it might take quite a long time.
Now, as to what people have been trying to do in representing the facts
about the common sense world, I would say there are really two things.
One of them can be described as is-a hierarchies, that is, as in taxonomy,
where you have it that a dog is an animal and a St. Bernard is
a dog, and one considers the properties that are inherited down this
hierarchy, and inheritance with exceptions. That has certainly had quite
a lot of use in AI. If one looks at systems, for example Flavors in Lisp,
their purpose is mainly to handle inheritance, and whether it's
to be done by logic or some other way is an interesting question. Anyway,
there are lots of ways of doing inheritance.
The second area in which there has been a lot of work is in formalisms
that describe the consequences of actions. The most straightforward
way to treat describing the consequences of actions is to write a program
that will simulate the consequences of actions. If you have clearly
defined situations, if you have complete information about the effects of
an action, or are willing to use a model which assumes complete
information about the effects of an action, then this works pretty well.
However, it doesn't work so well if you only have partial information about
a situation and only partial information about the effects of actions.
You can then no longer simulate because all you can determine are some facts
about the effects of the actions, some facts about the situation which
results from the action rather than the complete state of affairs.
But also apart from simulation, there are many other things that you want to
know about the effects of actions. For example, you may want to know that
no sequence of actions from a certain set will achieve a goal under
conditions in which an exhaustive search of the possible actions is impossible.
But you may know, for example, that all the actions that can be performed
leave a certain quantity constant, and then if you know that's true,
then you know that no sequence of actions will change this quantity.
To take a trivial example, if I wanted to be on the ceiling
of this room, we know that no sequence of steps of walking will achieve that
result. We don't need to explore the consequences of
individual sequences of steps.
We also use facts about the consequences of actions
to reason about the past. If I see something, then I can infer something
about the past. Someone brought in this projector; probably it wasn't here
last night.
Thus we need more facts about the consequences of actions than can be obtained
by running a simulation program.
In 1969, Pat Hayes and I wrote a paper including something called the situation
calculus, which is a particularly simple formalism for treating the
consequences of actions. It limits itself to actions that have
discrete results. It doesn't
treat continuous actions. It doesn't treat concurrent actions, that is,
when several different people are at the same time doing things.
Some things can be formalized reasonably in the situation calculus.
Moreover, some of the more applied AI systems use formalisms that are
in some sense derivative from the situation calculus.
In the 1970s, there was progress of an interesting kind. Namely, it was
discovered that the normal modes of reasoning in mathematical logic
needed to be supplemented by non-monotonic reasoning.
That's sort of interesting, because the first person to raise this issue
may have been Minsky. But he raised it in a kind of negative context;
namely, he was against the use of logic. Minsky in general doesn't like
the use of logic in AI, even though some of his best students have used
it. So, he said something like "Well, it can only do monotonic reasoning,"
and he gave the example of "What are you going to put in your database
about birds flying? Are you going to put in all the exceptions, that dead
birds can't fly and birds with their feet in concrete can't fly, and so
forth?" And no matter how many things you put in your conditions,
Minsky was ready to invent more things than you had taken into account.
Around the same time, some other people, myself
included, were coming to similar problems, but
from a positive point of view. Namely,
"let us add some
methods of reasonings to mathematical logic that will
make it work for non-montonic reasoning."
Let me put down a formula that has to do
with this question of non-monotonicity. (I always forget to
put this formula down, but it is quite important.)
The logicians use the following notation: They write
%
$$A\vdash p,$$
%
read ``$A$ infers $p$'', to mean that if $A$ is some set of sentences,
and $p$ is a sentence, then there is a proof of $p$ from $A$.
So, given the facts, $A$ being a collection of facts,
you can prove $p$ from that collection of facts. Now
suppose that $A$ is contained in $B$, that $B$ is some larger
collection of facts that includes all of the facts in $A$.
Then it is going to be true, in just about any mathematical
logical system that has an ordinary notion of proof, that
$p$ is provable from $B$. The reason is trivial; namely if you have
a proof of $p$ from $A$, then exactly the same proof will work
as a proof of $p$ from $B$.
The logicians summarize this using the schema
%
$$A\vdash p$$
$$A ⊂ B$$
$$B\vdash p.$$
%
However, ordinary human reasoning is often
non-monotonic in the sense of the following
example. Suppose that you know that I own a car and that
we are going somewhere, then you might think that perhaps
I will give you a ride. But when you learn that my car is in
California, then you no longer reach that conclusion. So, an
increase in facts causes a reduction in the conclusions.
Now, what we need to investigate there is not so much why you
took it back, why you no longer make the conclusion, but how
did you reach the conclusion in the first place? And the way
you reached the conclusion in the first place is a matter of
argument, but let me tell you my view of it. The more or
less standard model, or standard view, of what you can expect
when someone owns a car is that the car is accessible to him,
and therefore he can give someone a ride in the car.
And we make this reasoning from the standard model all the time.
Now, the question is how do we formalize this logically.
Several ways have been proposed.
Three well-known ways were proposed in three papers that appeared
in the magazine "Artificial Intelligence" in 1980.
One of them was called "Non-monotonic Logic", by
Drew McDermott and John Doyle, and another one was called "A
Logic of Defaults", by Raymond Reiter,
and the third one was called "Circumscription", by me.
I'm going to talk about circumscription as a means of non-monotonic reasoning.
Circumscription has taken a number of different forms
because, after it is proposed in one form,
applying it to new common sense reasoning problems
leads to the discovery
that it needs to be changed and generalized.
This process has not concluded yet.
Here is one of the more straightforward forms of circumscription.
We introduce a notion of ordering of predicates.
Let $P$ and $P'$ be predicates on the same domain.
We define
%
$$P ≤ P' ≡ (∀x)(P(x) ⊃ P'(x)).$$
%
What the $≤$ amounts to there is that $P$ is,
so to speak, no more true than $P'$, i.e.
it's true for a subset of $x$'s.
Thus what we want to talk about is a notion of minimizing the truth of $P$,
subject to some condition. Of course if we don't have any conditions,
we can minimize the truth of $P$ by making it always false. Therefore,
we need to talk about constrained minimization. Here we have
the formula called "circumscription": $A$ is the condition that has to
be satisfied, $P$ is what you want to minimize, and $Z$ is some set of things
that you are allowed to vary in order to achieve the minimization;
the things varied will of course include $P$ itself.
Then if we say that we want to minimize $P$ subject to this
condition, it's easy enough to see the formula we should write down.
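[The formula written on the board is not reproduced in the transcript; reconstructed
after McCarthy's 1986 paper ``Applications of Circumscription to Formalizing
Common-Sense Knowledge'', it reads
%
$$A(P,Z) ∧ ¬(∃P'\,Z')(A(P',Z') ∧ P' < P),$$
%
where $P' < P$ abbreviates $P' ≤ P ∧ ¬(P' = P)$.]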
First of all, $P$ and $Z$ must satisfy the condition.
And secondly, there should not be any alternative $P'$ and $Z'$ that
also satisfy the condition and for which $P'$ is less than $P$.
``Less than'' here means, in the sense of the ordering defined above,
less than or equal to, but not equal.
So $P'$ has to be different from $P$,
and it also has to satisfy the condition.
So this is the form of constrained minimization.
Now, actually, I've written this down in a simple form.
There are some generalizations which are important in practice
but which aren't conceptually different.
In particular, I've written it as though $P$ were a predicate of one variable,
but the same thing works if $P$ is a
predicate of several variables. I've written it as if
there were only one $P$ that you wish to minimize, but
it can be handled in a similar way if there are several $P$'s that
you wish to minimize, even if they have different numbers of arguments.
I've written it as though there were only one thing
$Z$ that you could vary besides $P$, but there could be many
of them. The only result of these generalizations is that
the formula's longer. Sparing you the longer formula
is all that I've accomplished by this.
I want to make this
more concrete by giving a particular kind of use of circumscription,
a particular kind of application,
and maybe before I do that, I should make a few further
remarks. It turns out that there is a fairly substantial
mathematical theory of this form of logical minimization that can
be developed. The first remark to make is that this formula is
a formula of second order logic, because this "there exists"
here refers to predicates. If you study AI, you know that there are
quite a few
theorem provers and problem solvers that manipulate first order
logic, and that second order logic is considerably more difficult
to handle. Well, it turns out that if you are interested in doing the
mathematics of this, there are a large number of cases in which the
second order formula collapses to a first order formula, so
you can apply your conventional methods after all.
There is a theory --- thanks to
Vladimir Lifschitz, Michael Gelfond, and others --- of the
collapsible cases of circumscription.
The second point is that if you are
a mathematician and you see this engineer (that is what the mathematicians
call me) talking about the minimum, then you get suspicious and you say,
``How do I know this minimum exists?'' And the answer is, ``Well, it
doesn't always exist,'' and so there is a theory giving sufficient
conditions under which you can guarantee that a minimum
exists, and then some examples where these conditions are not met,
in which you can show that a minimum does not exist.
There is the possibility of a considerable
mathematical theory of logical minimization.
And in fact some logical problems, like
generalized induction, that have been studied by logicians for some time,
are examples of logical minimization. In the theory
of programming, also, one talks about least fixed points.
However, the AI applications present somewhat different mathematical problems
from the ones the logicians have already studied.
Let me give
an application to a problem that Minsky posed in the early 1970s.
The first two formulas there are in fact copies of one
another; the second one is the first one abbreviated in a way
I like to abbreviate things.
Now I've been
careful to write certain things in black and certain things in red,
and first, let's look at the black thing, the black part.
If you look at this, it says that "for all X, not flies X".
So, looked at in this way, it says that things don't fly.
Looked at with the red in there it says, they don't fly unless
they are abnormal in aspect one. So, what you want to do is to write
down essentially what is generally true but qualified by some
abnormality. And now one of the pitfalls, it seems to me,
was that people some time ago oversimplified in thinking about this,
and thought about things like ``a normal bird flies.''
That's not quite right. The way you want to think about it is that
the bird that is normal in aspect one flies, because if you want to say
that a normal bird has feathers, but for some reason you know
this bird doesn't fly, you still want to be able to conclude, unless
you have information to the contrary, that it has feathers.
If you simply say it's an abnormal bird, then you don't know anything about
it at all. So that is where these aspects come in. So this "ab" stands
for abnormal in aspect one.
The next thing we say is that birds are abnormal in aspect one.
This cancels the effect of this rule as far as birds are concerned.
I call it a ``cancellation of inheritance'' axiom. It doesn't say that a bird
can't fly, but it certainly tells you that you can't use
this rule to say that it can't.
If you only look
at the black part of the next formula, it says birds fly. If you include the
red, it says that a bird that is not abnormal in aspect two flies. Then we
can go on and say penguins are abnormal in aspect two and to say that a
penguin that is not abnormal in aspect three doesn't fly, and so forth.
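[The axioms on the slide are not reproduced in the transcript; reconstructed
after McCarthy (1986), with the red parts being the abnormality conditions,
they read approximately
%
$$(∀x)(¬ab(aspect1(x)) ⊃ ¬flies(x)),$$
$$(∀x)(bird(x) ⊃ ab(aspect1(x))),$$
$$(∀x)(bird(x) ∧ ¬ab(aspect2(x)) ⊃ flies(x)),$$
$$(∀x)(penguin(x) ⊃ ab(aspect2(x))),$$
$$(∀x)(penguin(x) ∧ ¬ab(aspect3(x)) ⊃ ¬flies(x)).$$
%
]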
What you want to do in order to apply the method of circumscription is to
take the conjunction of all these axioms with the quantifiers attached
(I left off the quantifiers except on the first one, but they have to be
attached) and then minimize "ab". You want to circumscribe the
predicate ab, so you want to have as few things abnormal as possible.
That's the intent of this kind of theory. It's called "a simple
abnormality theory". That has proved to
be a moderately popular idea. Some people who don't like
circumscription do like "ab"s and use abnormalities in some other
nonmonotonic formalisms.
The other thing that you have to decide before you can use
circumscription is what's allowed to be varied.
One thing that must be allowed to be varied in achieving the minimization
of abnormality is the predicate "flies", because after all the whole
point of the exercise is to make some decisions about what flies and
what doesn't fly. So that's to be varied. Now you get a nice result
here if you say that's all that varies. In other words, whether
something is a bird or not is supposed to be known and definite.
Whether something is a penguin is supposed to be known and definite.
If you do it that way, then you get the result that as far as these facts
are concerned --- in the standard model of these facts, the things
that fly are those birds that are not penguins.
You might regard that as too strong a conclusion.
In that case you could elaborate these axioms by introducing the notion
of what objects are present in a given situation.
If you know that some birds are present, and some of them are penguins,
then you can apply this axiom, and you will say the objects that fly
in this situation are the things that are birds and not penguins.
The nice thing about this set of axioms is that you don't have
to change them to add additional facts. Suppose we want to say that
airplanes also fly. Then we have an axiom which is like this one, which
says that airplanes are abnormal in aspect one, and an axiom which is
like this one that says if something is an airplane and it's not abnormal
in aspect seven, say, then it flies.
I don't know whether you should reuse aspect two or whether you need
a new one. I think it doesn't matter.
Similarly, if you want to say that ostriches are abnormal in aspect two,
and that ostriches in general don't fly, you can add that without any
problem. And this at least partially solved Minsky's problem, because
as fast as he thinks of these exceptions you can write them down
without having to change anything that you already have.
Or as fast as exceptions occur, you can write them down.
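[Following the pattern just described, the added axioms would read approximately
%
$$(∀x)(airplane(x) ⊃ ab(aspect1(x))),$$
$$(∀x)(airplane(x) ∧ ¬ab(aspect7(x)) ⊃ flies(x)),$$
$$(∀x)(ostrich(x) ⊃ ab(aspect2(x))),$$
$$(∀x)(ostrich(x) ∧ ¬ab(aspect8(x)) ⊃ ¬flies(x)),$$
%
where the particular aspect numbers chosen for the new axioms do not matter,
as the speaker says.]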
Now actually this solution is only partially satisfactory.
The reason why it's somewhat satisfactory is if you are making
some kind of AI system, you may be able to work under conditions
in which it is definite as to what are the birds and what are the
penguins, and so forth and so on. But it is only partially satisfactory
because someone may also say, "Well, I don't know what the penguins and
what the birds are in this situation." Then you may do the circumscription on
these facts, making everything variable.
By the way, I'm not talking about this fact about canaries.
I want to leave that out for the time being. If you circumscribe these
facts altogether, then you get an undesirable result.
Nothing is abnormal at all. Nothing flies.
There are no birds, there are no penguins, and in fact in some sense
there are no objects. So letting everything vary is too much unless
there are more axioms.
You could say, "Well, but in a particular situation, we may have some
more facts, that Joe is a penguin and Tweety is known to be a bird.
And then you have to introduce some more abnormalities, to say it's abnormal
in some aspect to be a penguin. If you do that, you can still make this
example work. But then you come to this one, which is to say that there
is some other sense of the word canary, so that not all canaries are birds.
And in American slang, there is another sense of canary.
In the 1930s, in American movies anyway, a canary
was a criminal who confessed; "He sang like a canary"
was the saying at that time.
What you would like is to say, "We will provide for that:
we will provide for a canary that is not a bird,
but we'll just say those are abnormal in aspect four
and rely on circumscription to get rid of them."
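[The proposed axiom would read approximately
%
$$(∀x)(canary(x) ∧ ¬ab(aspect4(x)) ⊃ bird(x)).$$
%
]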
In the present formalism, that doesn't quite work,
because if you know that Joe is a canary,
what you would like to have come out as a result of your
non-monotonic reasoning is that, since you don't know anything else,
Joe is a bird and therefore flies.
What you will get when you do the circumscription are two possibilities:
one of them is that conclusion, and the other is that Joe is not a bird
and does not fly, and furthermore is not abnormal in aspect one.
So when you minimize ab, you get an unintended solution.
What to do about these unintended solutions is something
on which people have not yet come to an agreement.
There are a number of possibilities. However,
the canary problem is not the one to which people have given
most of their attention. That has gone instead to something called
"the Yale shooting problem", which I'll come to a little later.
I'll come back to this problem
in connection with another example. I want to talk
a little bit about actions.
There are entities called situations
that are considered to be states of affairs. We
have events, and we have a new situation that results
from an event in an old situation.
Thus the result of
an event in a situation is a new situation.
We write
$$s' = result(e,s).$$
The most common events that we'll be interested
in are actions, so we are interested in the new
situation that results from an action. We also have
fluents, which are kinds of things like functions --
they have values. We have the common sense law of
inertia. The common sense law of inertia takes
the form that unless you know something particular about an
event and a fluent, then the event doesn't change this
fluent. So this says that the value
of this fluent in the result of the event is the same as the
value it had before, unless there is something
abnormal about aspect one of the fluent, the event,
and the situation.
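[The formula is not reproduced in the transcript; in the notation of
McCarthy (1986) the common sense law of inertia reads approximately
%
$$(∀e\,f\,s)(¬ab(aspect1(f,e,s)) ⊃ value(f,result(e,s)) = value(f,s)).$$
%
]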
Now we specialize this to the blocks world.
In this world there are some kinds of objects:
blocks, which I'll denote by the variable K;
locations, which I'll denote by the variable L;
and colors, which I'll denote by the variable C.
We have some axioms concerning
two kinds of actions: one is moving
a block and another is painting a block.
This particular form of axiomatization is designed
to solve a problem which came up in previous
formalizations, and this problem is called "the frame
problem." "The frame problem" is that you need
to be able to say that moving a block doesn't
move other blocks, unless you have some positive
information that it does, and painting a block
doesn't move it, and moving it doesn't change its
color, and so forth. But you want to avoid
actually having to say these things explicitly,
because you can do better than that; you can do without it,
and in particular, humans do without it.
Here are the two axioms that work in
this case. What they say is that unless something
is abnormal about the block, the location, and the
situation, the value of the location of that block,
in the result of moving it to the
location, is the location. And similarly, unless
something is abnormal --- unless we have an aspect three
which is abnormal --- the value of its color
in the situation that results from painting it
in color C is C.
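[Reconstructed, the two axioms would read approximately
%
$$(∀k\,l\,s)(¬ab(aspect2(k,l,s)) ⊃ value(location(k),result(move(k,l),s)) = l),$$
$$(∀k\,c\,s)(¬ab(aspect3(k,c,s)) ⊃ value(color(k),result(paint(k,c),s)) = c),$$
%
with $k$ ranging over blocks, $l$ over locations, and $c$ over colors;
the aspect numbering follows the surrounding discussion.]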
Then we can put in some preconditions for this moving.
For example, we can say that the top of the block should be clear; that is,
there should not be something located on top of this block.
And the place you want to move it to should
be clear. And there is also a restriction that prevents
this rule from letting you move an object to its
own top. If any of these conditions is violated,
then the case is abnormal in aspect two and you can't conclude
that the block can be moved. However, the same
old unintended model problem
arises. This one was discovered by Vladimir
Lifschitz, and it occurs when we attempt to
calculate the value of the location of B in the
situation that results from moving B to the
top of C, in the situation that results from moving A
to the top of B --- that is, we ask whether B ends up on top of C.
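[The query, reconstructed, with $S_0$ standing for the initial situation:
%
$$value(location(B),result(move(B,top(C)),result(move(A,top(B)),S_0))) = top(C)\,?$$
%
]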
In the intended model, the answer is No,
or at least you can't conclude it, because
when you move A onto B, then B no longer has a
clear top, and so therefore, our rules do not permit
us to conclude that a move from B onto the top of C is
successful. That's the intended model.
The unintended model arises in the following
way. We're minimizing ab in this whole
thing, and another minimum turns out to be
that for a totally unknown reason, this move
was unsuccessful, and therefore there is
nothing on the top of B in the second situation,
so B can be moved to the top of C. That's the
"unintended model."
There are various ways of getting rid of
this unintended model. Vladimir Lifschitz
invented two of them. It is not clear
which is to be preferred. Which is to be
preferred is determined by which will lead to a more
general theory; that is, mere solving of this
particular problem is not sufficient. What
I want to show you is one of Lifschitz's solutions,
but applied to a simpler problem,
namely the "Yale shooting problem."
This is a problem proposed by Drew McDermott
of Yale. These axioms certainly do not
describe the problem, so I'd better describe it.
The problem is again one of getting rid of an
unintended model. The actions are roughly
the following: You have somebody, I'll call him Fred,
who's alive in an initial situation $S_0$.
There is a gun. The gun is loaded. In this formalism,
you don't need to say by whom. And there is a
wait about which we have no positive information
concerning what effects it might have. And
then the gun is shot. And so we have the following
facts about the initial situation.
The proposition "alive," which actually could have Fred as a parameter
but the parameter is completely inactive, so you might as well
leave it out, holds alive at $S_0$. Fred is alive at $S_0$.
The gun is not loaded at $S_0$. We have the following general facts
about the effects of actions. One is that the loading causes the gun
to be loaded, or causes the truth value of the gun being
loaded to be true.
And shooting causes "loaded" to be false, and shooting causes "alive"
to be false. But shooting has a precondition, and that is that the
gun be loaded. These are the facts of the initial situation,
and these are the general facts about the effects of actions.
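[Reconstructed in the notation of Lifschitz's 1987 paper ``Formal Theories
of Action'', these facts would read approximately
%
$$holds(alive,S_0), \qquad ¬holds(loaded,S_0),$$
$$causes(load,loaded,true),$$
$$causes(shoot,loaded,false), \qquad causes(shoot,alive,false),$$
$$precond(loaded,shoot).$$
%
]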
Then this theory has some general axioms, that is, axioms
which are independent of the problem of shooting Fred.
The first two things are just definitions, intended only
to make the following formulas shorter and to introduce some
correspondence between the formal terms of this theory and ordinary
experience. The success of an action in a situation is simply
that all of its preconditions are satisfied, i.e. every fluent F
which is a precondition of the action holds in the situation.
The second definition is that an action affects a fluent
in a situation if (a) it succeeds, and (b) there is some value
that it causes the fluent to have.
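[The two definitions, reconstructed:
%
$$success(a,s) ≡ (∀f)(precond(f,a) ⊃ holds(f,s)),$$
$$affects(a,f,s) ≡ success(a,s) ∧ (∃v)\,causes(a,f,v).$$
%
]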
Now a little bit about this
predicate "causes." If you were a philosopher you would say,
"How does he define causes? We philosophers have argued about
that for more than 2000 years, and which of us does he believe as to
what the definition of causality is?" And the answer is "none of
the above"; namely, we are introducing "causes" in this formalism
as an undefined term. Maybe it corresponds to current human
usage or, even less likely, to a current philosophical
usage; maybe it doesn't. But the hope is that it is going to be useful
for AI purposes.
So we have these two definitions, and then we have an axiom about
the effects of success, which is really analogous to the
previous axioms saying that if the action of moving a block
to a location L is successful, then the block is at location L.
This says that success of A in S, together with A causing the fluent
to have a certain value, implies that the fluent has that value.
Actually, here the fluents are specialized to have the values true and false,
although later in the same paper Lifschitz generalizes this. But since I
want to get all the formulas of Lifschitz's theory on one sheet of
paper, or on the slide, I did this version rather than the other;
the other one handles the blocks world.
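[Reconstructed, the axiom about the effects of success and the common sense
law of inertia would read approximately
%
$$success(a,s) ∧ causes(a,f,v) ⊃ (holds(f,result(a,s)) ≡ v = true),$$
$$¬affects(a,f,s) ⊃ (holds(f,result(a,s)) ≡ holds(f,s)).$$
%
]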
The last one is the common sense law of inertia, which says that
if an action doesn't affect a fluent, then the fluent holds
in the resulting situation if and only if it held before; it
has the same value as it had before. You can take my word, or rather
take Lifschitz's word, for the fact that he has proved that if you use
these axioms, and you circumscribe the two predicates causes and
precond (that's why I made them red; so they'd play the role of ab)
in the axioms, then you will get only one model: the intended one,
in which $¬holds(alive, result(shoot, result(wait, result(load, S_0))))$.
So that's what Lifschitz can prove:
if you minimize causes and precond, then you will be able to
prove this, and the unintended model that you would get
from the simple abnormality theory doesn't occur. Now maybe that's
all the formulas that I want to discuss. That was perhaps too many
anyway.
Let me again say a little bit about the state of formalizing common sense.
In spite of the fact that it isn't yet decided which are the best
ways of getting rid of this kind of unintended model, I believe
this problem will be solved fairly satisfactorily in the next few
years. There are a large number of proposals for solving it. Then,
let me say just a little bit about some of them. One of McDermott's
students, Shoham, has a notion that he calls "chronological
minimization", which can be expressed by a logical formula. It
says that if something is going to be abnormal, if we have a
choice as to what's going to be abnormal, we would like the
abnormality to occur as late as possible. That's good enough for a
number of problems, especially for predicting the effects of
actions, but what it comes from is saying we know about the future only
from our reasoning about the events that occur in the present, and
if we want to know what occurs two days from now, then basically
we know about it because we reason about what happens tomorrow,
and from that we reason about what will be the case two days from now.
So this is his chronological minimization. It will work for a lot of
AI problems. It has some limitations. The limitations perhaps arise
if you want to reason about the past from the present.
Then what you know is about the present, and you want to reason about
the day before yesterday via yesterday. The situation isn't entirely
analogous to the forward reasoning, although I've put it in a parallel form.
What it suggests is that, instead of trying to do what I call a
fully objective theory, in which we make our minimizations concerned
with what's in the world, we attempt a more subjective theory
which has the following character: if we want to know
about Z, and all we know about Z is derived from what we found out
about Y, and all we know about Y is what we found out about X,
then we should prefer abnormalities in Z to abnormalities in Y.
Shoham has got some kind of formalization of this (except that
apparently as yet nobody else understands it) which he calls
"chronological ignorance." That's a sort of another approach to it,
but because that approach involves to some extent formalizing
not merely facts about the world, but facts about knowledge, it's
more difficult than the other. So, I hope that I have given you some idea
of the goals and the present state of the efforts to give
computer programs common sense.